AITopics | decoder-only model

Collaborating Authors

decoder-only model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Modeling Contextual Passage Utility for Multihop Question Answering

Jain, Akriti, Garimella, Aparna

arXiv.org Artificial IntelligenceDec-9-2025

Multihop Question Answering (QA) requires systems to identify and synthesize information from multiple text passages. While most prior retrieval methods assist in identifying relevant passages for QA, further assessing the utility of the passages can help in removing redundant ones, which may otherwise add to noise and inaccuracies in the generated answers. Existing utility prediction approaches model passage utility independently, overlooking a critical aspect of multihop reasoning: the utility of a passage can be context-dependent, influenced by its relation to other passages - whether it provides complementary information or forms a crucial link in conjunction with others. In this paper, we propose a lightweight approach to model contextual passage utility, accounting for inter-passage dependencies. We fine-tune a small transformer-based model to predict passage utility scores for multihop QA. We leverage the reasoning traces from an advanced reasoning model to capture the order in which passages are used to answer a question and obtain synthetic training data. Through comprehensive experiments, we demonstrate that our utility-based scoring of retrieved passages leads to improved reranking and downstream QA performance compared to relevance-based reranking methods.

large language model, machine learning, utility score, (17 more...)

arXiv.org Artificial Intelligence

2512.06464

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

Comprehensive Evaluation on Lexical Normalization: Boundary-Aware Approaches for Unsegmented Languages

Higashiyama, Shohei, Utiyama, Masao

arXiv.org Artificial IntelligenceDec-2-2025

Lexical normalization research has sought to tackle the challenge of processing informal expressions in user-generated text, yet the absence of comprehensive evaluations leaves it unclear which methods excel across multiple perspectives. Focusing on unsegmented languages, we make three key contributions: (1) creating a large-scale, multi-domain Japanese normalization dataset, (2) developing normalization methods based on state-of-the-art pretrained models, and (3) conducting experiments across multiple evaluation perspectives. Our experiments show that both encoder-only and decoder-only approaches achieve promising results in both accuracy and efficiency.

computational linguistic, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2505.22273

Country:

Asia > Japan > Honshū (0.46)
North America > United States > Minnesota (0.28)
North America > United States > New Mexico (0.28)
Europe > Middle East > Malta (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)
Information Technology > Communications > Social Media (0.68)
(2 more...)

Add feedback

ViClaim: A Multilingual Multilabel Dataset for Automatic Claim Detection in Videos

Giedemann, Patrick, von Däniken, Pius, Deriu, Jan, Rodrigo, Alvaro, Peñas, Anselmo, Cieliebak, Mark

arXiv.org Artificial IntelligenceOct-13-2025

The growing influence of video content as a medium for communication and misinformation underscores the urgent need for effective tools to analyze claims in multilingual and multi-topic settings. Existing efforts in misinformation detection largely focus on written text, leaving a significant gap in addressing the complexity of spoken text in video transcripts. We introduce ViClaim, a dataset of 1,798 annotated video transcripts across three languages (English, German, Spanish) and six topics. Each sentence in the transcripts is labeled with three claim-related categories: fact-check-worthy, fact-non-check-worthy, or opinion. We developed a custom annotation tool to facilitate the highly complex annotation process. Experiments with state-of-the-art multilingual language models demonstrate strong performance in cross-validation (macro F1 up to 0.896) but reveal challenges in generalization to unseen topics, particularly for distinct domains. Our findings highlight the complexity of claim detection in video transcripts. ViClaim offers a robust foundation for advancing misinformation detection in video-based communication, addressing a critical gap in multimodal analysis.

annotator, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2504.12882

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (0.66)

Industry:

Media > News (0.91)
Government > Voting & Elections (0.68)
Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs

García-de-Herreros, Paloma, Slusallek, Philipp, Klakow, Dietrich, Gautam, Vagrant

arXiv.org Artificial IntelligenceOct-8-2025

Large language models have shown great success on natural language tasks in recent years, but they have also shown great promise when adapted to new modalities, e.g., for scientific machine learning tasks. Even though decoder-only models are more popular within NLP and scale exceedingly well at generating natural language, most proposed approaches for cross-modal adaptation focus on encoder-only models, raising the question of how model architecture affects these approaches. In this paper, we therefore perform a series of ablation studies to answer this question, systematically comparing encoder-only and decoder-only models on cross-modal adaptation for time-dependent simulation tasks based on partial differential equations (PDEs). We find that decoder-only models are far worse than encoder-only models, when existing approaches are applied unmodified. In contrast to several other domains, scaling decoder-only models also does not help. To harness the potential of decoder-only models in this context, we introduce two novel approaches, Parallel Flipping and Sequence Doubling, attempting to mimic bidirectionality in autoregressive models. Both our methods improve overall performance using decoder-only models for all tasks and all cross-model adaptation methods, closing the gap to encoder-only model performance. We hope that our findings broaden the spectrum of models used on cross-modal adaptation tasks to further scientific ML.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.05278

Country: North America > United States (0.29)

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.48)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Better with Less: Small Proprietary Models Surpass Large Language Models in Financial Transaction Understanding

Ding, Wanying, Narendra, Savinay, Shi, Xiran, Ratnaparkhi, Adwait, Yang, Chengrui, Sabzevar, Nikoo, Yin, Ziyan

arXiv.org Artificial IntelligenceOct-1-2025

Analyzing financial transactions is crucial for ensuring regulatory compliance, detecting fraud, and supporting decisions. The complexity of financial transaction data necessitates advanced techniques to extract meaningful insights and ensure accurate analysis. Since Transformer-based models have shown outstanding performance across multiple domains, this paper seeks to explore their potential in understanding financial transactions. This paper conducts extensive experiments to evaluate three types of Transformer models: Encoder-Only, Decoder-Only, and Encoder-Decoder models. For each type, we explore three options: pretrained LLMs, fine-tuned LLMs, and small proprietary models developed from scratch. Our analysis reveals that while LLMs, such as LLaMA3-8b, Flan-T5, and SBERT, demonstrate impressive capabilities in various natural language processing tasks, they do not significantly outperform small proprietary models in the specific context of financial transaction understanding. This phenomenon is particularly evident in terms of speed and cost efficiency. Proprietary models, tailored to the unique requirements of transaction data, exhibit faster processing times and lower operational costs, making them more suitable for real-time applications in the financial sector. Our findings highlight the importance of model selection based on domain-specific needs and underscore the potential advantages of customized proprietary models over general-purpose LLMs in specialized applications. Ultimately, we chose to implement a proprietary decoder-only model to handle the complex transactions that we previously couldn't manage. This model can help us to improve 14% transaction coverage, and save more than \$13 million annual cost.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.25803

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Banking & Finance (1.00)
Law Enforcement & Public Safety > Fraud (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

"Where does it hurt?" -- Dataset and Study on Physician Intent Trajectories in Doctor Patient Dialogues

Röhr, Tom, Roy, Soumyadeep, Mohamad, Fares Al, Papaioannou, Jens-Michalis, Nejdl, Wolfgang, Gers, Felix, Löser, Alexander

arXiv.org Artificial IntelligenceAug-27-2025

In a doctor-patient dialogue, the primary objective of physicians is to diagnose patients and propose a treatment plan. Medical doctors guide these conversations through targeted questioning to efficiently gather the information required to provide the best possible outcomes for patients. To the best of our knowledge, this is the first work that studies physician intent trajectories in doctor-patient dialogues. We use the `Ambient Clinical Intelligence Benchmark' (Aci-bench) dataset for our study. We collaborate with medical professionals to develop a fine-grained taxonomy of physician intents based on the SOAP framework (Subjective, Objective, Assessment, and Plan). We then conduct a large-scale annotation effort to label over 5000 doctor-patient turns with the help of a large number of medical experts recruited using Prolific, a popular crowd-sourcing platform. This large labeled dataset is an important resource contribution that we use for benchmarking the state-of-the-art generative and encoder models for medical intent classification tasks. Our findings show that our models understand the general structure of medical dialogues with high accuracy, but often fail to identify transitions between SOAP categories. We also report for the first time common trajectories in medical dialogue structures that provide valuable insights for designing `differential diagnosis' systems. Finally, we extensively study the impact of intent filtering for medical dialogue summarization and observe a significant boost in performance. We make the codes and data, including annotation guidelines, publicly available at https://github.com/DATEXIS/medical-intent-classification.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2508.19077

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (0.69)
Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model

Hennara, Khalil, Hreden, Muhammad, Hamed, Mohamed Motaism, Aldallal, Zeina, Chrouf, Sara, AlModhayan, Safwan

arXiv.org Artificial IntelligenceAug-22-2025

We introduce Mutarjim, a compact yet powerful language model for bidirectional Arabic-English translation. While large-scale LLMs have shown impressive progress in natural language processing tasks, including machine translation, smaller models. Leveraging this insight, we developed Mutarjim based on Kuwain-1.5B , a language model tailored for both Arabic and English. Despite its modest size, Mutarjim outperforms much larger models on several established benchmarks, achieved through an optimized two-phase training approach and a carefully curated, high-quality training corpus.. Experimental results show that Mutarjim rivals models up to 20 times larger while significantly reducing computational costs and training requirements. We also introduce Tarjama-25, a new benchmark designed to overcome limitations in existing Arabic-English benchmarking datasets, such as domain narrowness, short sentence lengths, and English-source bias. Tarjama-25 comprises 5,000 expert-reviewed sentence pairs and spans a wide range of domains, offering a more comprehensive and balanced evaluation framework. Notably, Mutarjim achieves state-of-the-art performance on the English-to-Arabic task in Tarjama-25, surpassing even significantly larger and proprietary models like GPT-4o mini. We publicly release Tarjama-25 to support future research and advance the evaluation of Arabic-English translation systems.

large language model, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2505.17894

Country:

Asia > Middle East (0.46)
Asia > Japan (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Add feedback

What Language(s) Does Aya-23 Think In? How Multilinguality Affects Internal Language Representations

Trinley, Katharina, Nakai, Toshiki, Anikina, Tatiana, Baeumel, Tanja

arXiv.org Artificial IntelligenceJul-29-2025

Large language models (LLMs) excel at multilingual tasks, yet their internal language processing remains poorly understood. We analyze how Aya-23-8B, a decoder-only LLM trained on balanced multilingual data, handles code-mixed, cloze, and translation tasks compared to predominantly monolingual models like Llama 3 and Chinese-LLaMA-2. Using logit lens and neuron specialization analyses, we find: (1) Aya-23 activates typologically related language representations during translation, unlike English-centric models that rely on a single pivot language; (2) code-mixed neuron activation patterns vary with mixing rates and are shaped more by the base language than the mixed-in one; and (3) Aya-23's languagespecific neurons for code-mixed inputs concentrate in final layers, diverging from prior findings on decoder-only models. Neuron overlap analysis further shows that script similarity and typological relations impact processing across model types. These findings reveal how multilingual training shapes LLM internals and inform future cross-lingual transfer research.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.20279

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

Seq vs Seq: An Open Suite of Paired Encoders and Decoders

Weller, Orion, Ricci, Kathryn, Marone, Marc, Chaffin, Antoine, Lawrie, Dawn, Van Durme, Benjamin

arXiv.org Artificial IntelligenceJul-16-2025

The large language model (LLM) community focuses almost exclusively on decoder-only language models, since they are easier to use for text generation. However, a large subset of the community still uses encoder-only models for tasks such as classification or retrieval. Previous work has attempted to compare these architectures, but is forced to make comparisons with models that have different numbers of parameters, training techniques, and datasets. We introduce the SOTA open-data Ettin suite of models: paired encoder-only and decoder-only models ranging from 17 million parameters to 1 billion, trained on up to 2 trillion tokens. Using the same recipe for both encoder-only and decoder-only models produces SOTA recipes in both categories for their respective sizes, beating ModernBERT as an encoder and Llama 3.2 and SmolLM2 as decoders. Like previous work, we find that encoder-only models excel at classification and retrieval tasks while decoders excel at generative tasks. However, we show that adapting a decoder model to encoder tasks (and vice versa) through continued training is subpar compared to using only the reverse objective (i.e. a 400M encoder outperforms a 1B decoder on MNLI, and vice versa for generative tasks). We open-source all artifacts of this study including training data, training order segmented by checkpoint, and 200+ checkpoints to allow future work to analyze or extend all aspects of training.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2507.11412

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

From BERT to Qwen: Hate Detection across architectures

Mon, Ariadna, Fenollosa, Saúl, Lecumberri, Jon

arXiv.org Artificial IntelligenceJul-15-2025

Online platforms struggle to curb hate speech without over-censoring legitimate discourse. Early bidirectional transformer encoders made big strides, but the arrival of ultra-large autoregressive LLMs promises deeper context-awareness. Whether this extra scale actually improves practical hate-speech detection on real-world text remains unverified. Our study puts this question to the test by benchmarking both model families, classic encoders and next-generation LLMs, on curated corpora of online interactions for hate-speech detection (Hate or No Hate).

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2507.10468

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback